Method for diagnosis of genetic diseases by molecular combing and diagnosis box

ABSTRACT

The invention concerns a method for detecting or locating one or several genes, or one or several specific A DNA sequences, or one or several molecules reacting with DNA, on a B DNA, characterised in that it consists in: (a) fixing and combing a certain amount of said B DNA on a combing surface; (b) reacting the B combing product with one or several probes, linked with the gene(s) or specific A DNA sequences, or with the molecules capable of reacting with DNA; (c) extracting information corresponding to at least one of the following categories: (1) the position of the probes, (2) the distance between the probes, (3) the size of the probes (the total sum of sizes for quantifying the number of hybridised probes), for determining therefrom the presence, the location and/or the amount of genes or specific A DNA sequences. This method can be used in particular for the diagnosis of genetic diseases.

The present invention relates in particular to a method for detecting and locating polynucleotide sequences (which may or may not contain genes or gene portions) in a genome or a genome portion using the so-called molecular combing technique.

The present invention also relates to a method for detecting and locating reagents of biological, natural or synthetic origin by combining said reagents with all or part of the combed DNA.

The technique of molecular combing, as described in the following references: PCT/FR95/00164 of Oct. 2, 1995 and PCT/FR95/00165 of Oct. 2, 1995, applied to nucleic acids, and more particularly to genomic DNA, allows the uniform extension and the visualization of DNA or of RNA in the form of rectilinear and practically aligned filaments.

The present invention is based on the demonstration of the fact that, using probes, that is to say polynucleotides containing a chain of nucleotide sequences such as labeled DNA molecules which specifically recognize portions of the aligned DNA, which are hybridized with the combed DNA, it is possible to directly visualize, on the combed genome, the position of the complementary sequence.

Under these conditions, it is possible, for example using two probes labeled with different chromophores such that they can be visualized by a color, red and green for example, to measure the distance separating them. However, it is also possible, using different probes or a series of contiguous probes (called hereinafter “contig”), to directly measure the length of the region of interest, and to measure the potential impairments thereof in the case of an abnormal genome.

The present invention therefore relates, in particular, to the diagnosis of genetic diseases which are preferably characterized by substantial impairments of the genome, either in its structure, deletion or translocation for example, or in the number of copies of certain sequences (trisomy for example, where the sequence represents the whole of a chromosome), as well as to methods which allow genes to be located and mapped rapidly.

Genetic diagnosis may be divided into several fields:

prenatal,

pathologies with a genetic component,

cancer and susceptibility to cancer.

Prenatal Diagnosis

The majority (95%) of fetal abnormalities are due to trisomies of chromosomes 21, 18, 13, X or Y. Their conclusive diagnosis is somewhat late (17th week of amenorrhea, by amniocentesis for example). Amniocentesis requires a substantial puncture of amniotic fluid (a few tens of milliliters) from which fetal cells in suspension are extracted and cultured for several days (see the technique described by S. Mercier and J. L. Bresson (1995) Ann. Génét., 38, 151-157). A karyotype of these cells is established by microscopic observation and counting of the chromosomes by highly specialized staff.

A technique involving the collection of chorial villi makes it possible to dispense with the culturing step and avoids the collection of amniotic fluid. Karyotype analysis requires, however, the same work (see Médecine Prénatale. Biologie Clinique du Foetus. André Boué, Publisher Flammarion, 1989). These two techniques may be applied earlier (as early as 7 weeks of gestation for the collection of chorial villi and 13-14 weeks for amniocentesis), but with a slightly increased risk of abortion. Finally, a direct collection of fetal blood at the level of the umbilical cord allows karyotyping without culturing, but presupposes a team of clinicians specialized in this technique (C. Donner et al., 1996, Fetal Diagn. Ther., 10, 192-199).

Other abnormalities such as translocations or deletions/insertions of substantial portions of chromosomes may be detected at this stage, or by using techniques such as fluorescent in situ hybridization (FISH). However, here again, this type of diagnosis can only be carried out by a highly qualified staff.

Studies show, moreover, that there are as yet no immunological methods allowing the detection of fetal markers in maternal blood allowing a conclusive diagnosis of trisomy 21 or of other abnormalities (see, for example, N. J. Wald et al., 1996, Br. J. Obstet. Gynaecol., 103, 407-412 for trisomy 21-related Down's syndrome).

The current prenatal diagnoses therefore have numerous disadvantages: they can only be carried out at a relatively late stage of the development of the embryo; they are not completely without risk for the fetus or for the mother; the results are often obtained after a fairly long time (about 1 to 3 weeks depending on the technique) and they are costly. Finally, a number of chromosomal abnormalities go undetected.

Diagnosis of Pathologies with a Genetic Component

Many diseases have a recognized genetic component (diabetes, hypertension, obesity and the like) which is the result of deletions, insertions and/or chromosomal rearrangements of variable sizes. The culturing of cells does not pose any problem at this stage, but the FISH techniques (which are described by G. D. Lichter et al. (1993), Genomics, 16, 320-324; B. Brandritt, et al., (1991), Genomics, 10, 75-82 and G. Van den Hengh et al., (1992), Science, 257, 1410-1412) have a limited resolution and require a highly qualified staff, making these tests barely accessible.

The development of a more effective and inexpensive test would allow the general adoption of suitable therapies, at an early stage of the pathologies involved, likely to improve their remission.

Cancer Diagnosis

Among the pathologies with a genetic component, cancerous conditions constitute a major class affecting an increasing proportion of the population. Current understanding of the process of the onset of a cancerous condition involves a step of proliferation of proto-oncogenes (mutations in the genome of the cells) which precedes the transformation of the cell to a cancerous cell. This proliferation step is unfortunately not detectable, whereas the possibility of carrying out a treatment at this stage would certainly increase the chances of remission and would reduce the patients' handicap.

Finally, a number of tumors are characterized by chromosomal rearrangements such as translocations, deletions, partial or complete trisomies, and the like.

In each of these fields, molecular combing can provide a major contribution, either by the speed and the small quantity of biological material needed, or by the quantitative accuracy of the results.

The importance of the technique appears most particularly in the case where the genetic material is obtained from cells which are no longer dividing or which cannot be cultured, or even from dead cells in which the DNA is not significantly degraded.

In the case of prenatal diagnosis, such is the case after extraction of fetal cells circulating in the maternal blood (Cheung et al., 1996, Nature Genetics 14, 264-268). The same applies in the case of cancerous cells obtained from certain tumors.

Molecular combing makes it possible to improve the possibilities of diagnosis of genetic diseases, but it may also allow the study and identification of the genomic sequences responsible for said diseases. Moreover, currently, the development of a diagnostic “kit” or box starts with the search for the gene involved in the pathology.

The search for genes involved in pathologies (human or other) is nowadays generally carried out in several steps:

(i) Establishment of a target population of individuals affected by the pathologies, of their descendants, ascendants and collaterals, and collection of blood and/or cell samples for the purpose of storing genetic material (in the form of DNA or of cellular strains).

(ii) Genetic location by analysis of probability of cosegregation with genetic markers (linkage analysis). At this stage of the study, a few close markers located on one or more given chromosomes are available which make it possible to proceed to the step of physical location.

(iii) Physical mapping: starting with the genetic markers obtained in the preceding step, a screening of libraries of human DNA clones (YACs, BACs, cosmids and the like) specific for the region(s) determined in the preceding step is carried out. A number of clones containing the preceding markers are thus obtained. The region of interest may then be precisely mapped using clones of decreasing size. Cloning of the genome portion considered may also be carried out again using the human DNA.

(iv) Search for the gene: several techniques may be used at this stage: exon “trapping” (use of cDNA libraries (complementary DNAs obtained from messenger RNA)), CpG islands, conservation of sequences between species, and the like, which make it possible to assign a coding sequence to one (or more) of the clones selected in the preceding steps.

This strategy as a whole represents a major undertaking (possibly up to several years). Consequently, any technique which makes it possible to arrive more quickly at step (iv) constitutes an advantage for the search for genes, but also in diagnosis.

In the current state of the art, when the gene has been located, for example by the preceding method, its detection is in general carried out using specific probes corresponding to the sequence in question, the latter being amplified by methods of the PCR type for example or the LCR type (as described in patent EP 0 439 182) or a technique of the NASBA type (kit marketed by the company Organon Teknika).

However, amplification techniques are not completely satisfactory, especially for heterozygotes, since a normal copy of the gene exists in the genome, as well as in the case of a large deletion or of diseases involving repetitive sequences where PCR is not satisfactory either.

The diagnosis of a large number of genetic diseases can now be envisaged using molecular combing and labeling of DNA.

Molecular combing is a technique which consists in anchoring DNA molecules by their ends to surfaces under well defined physicochemical conditions, followed by their stretching with the aid of a receding meniscus; DNA molecules aligned in a parallel manner are thus obtained. The purified DNA used may be of any size, and therefore in particular genomic DNA extracted from human cells. The genome may also be obtained from a genomic material containing at least 80% genetic material of fetal origin.

The DNA molecules thus combed may be denatured before being hybridized with nucleic acid probes labeled by any appropriate means (in particular with biotin-dUTP or digoxygenin-dUTP nucleotides), which are then revealed, for example, with the aid of fluorescent antibody systems.

Given that molecular combing is characterized by a constant extension of the combed molecules, the measurement of the lengths of the fluorescent fragments observed with the aid of an epifluorescence microscope (for example) therefore directly gives the size of the hybridized probe fragments.

The degree of extension depends on the type of surface, but can be precisely measured; it is for example 2 kilobases (kb) per micrometer (μm) in the case of surfaces silanized according to the protocol described in reference (1) and used in the examples.
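As an illustration of how a constant extension turns a measured length into a sequence size, the following minimal sketch applies the 2 kb/μm figure quoted above. The function name and the default value are assumptions made for the example; the actual factor depends on the surface and should be measured.

```python
# Illustrative only: the constant below is the example value quoted in the text
# for silanized surfaces; other surfaces give other extension factors.
KB_PER_UM = 2.0


def signal_length_to_kb(length_um: float, kb_per_um: float = KB_PER_UM) -> float:
    """Convert a fluorescent signal length measured in micrometers to kilobases."""
    if length_um < 0:
        raise ValueError("a measured length cannot be negative")
    return length_um * kb_per_um


# A 20 μm signal corresponds to about 40 kb at 2 kb/μm.
print(signal_length_to_kb(20.0))  # 40.0
```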

When necessary, it is possible to provide for an internal standard, that is to say a so-called calibrating DNA of known length which will make it possible to calibrate the operation, that is to say to calibrate each measurement.
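One possible way of using such an internal standard is sketched below: the extension factor is estimated from several measurements of the calibrating DNA and then applied to the probe signals of the same surface. The function names, the choice of a 48.5 kb standard and the measured values are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean


def extension_factor(standard_kb: float, measured_um: list[float]) -> float:
    """Estimate kb per micrometer from a calibrating DNA of known length.

    measured_um holds the lengths of the standard observed on the same surface.
    """
    if not measured_um:
        raise ValueError("at least one measurement of the standard is required")
    return standard_kb / mean(measured_um)


def calibrated_size_kb(length_um: float, factor_kb_per_um: float) -> float:
    """Apply the measured extension factor to a probe signal length."""
    return length_um * factor_kb_per_um


# Hypothetical data: a 48.5 kb standard measured three times on one surface.
factor = extension_factor(48.5, [24.0, 24.5, 24.25])
print(round(factor, 3))                             # 2.0
print(round(calibrated_size_kb(17.5, factor), 1))   # 35.0
```

Averaging several molecules of the standard, rather than relying on a single one, limits the influence of broken or incompletely stretched molecules.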

The present invention, which includes various embodiments, relates essentially to a method for detecting the presence or the location of one or more genes or of one or more sequences of specific A DNA or of one or more molecules reacting with the DNA on a B DNA, characterized in that:

(a) a certain quantity of said B DNA is attached to and combed on a combing surface,

(b) the B combing product is reacted with one or more labeled probes, bound to the gene(s) or to the sequences of specific A DNA(s) or to the molecules capable of reacting with the DNA,

(c) the information corresponding to at least one of the following categories is extracted:

(1) the position of the probes,

(2) the distance between probes,

(3) the size of the probes (the total sum of the sizes which make it possible to quantify the number of hybridized probes)

so as to deduce therefrom the presence, the location and/or the quantity of the genes or of the sequences of specific A DNA.
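The three categories of information in step (c) can be sketched for a single combed fiber as follows. This is a minimal illustration, not the method's own software: the `Signal` record, the probe names and the assumed extension of 2 kb/μm are hypothetical, and each probe is assumed to give one signal per fiber.

```python
from dataclasses import dataclass

KB_PER_UM = 2.0  # assumed extension factor; to be measured for each surface


@dataclass(frozen=True)
class Signal:
    """One hybridization signal on a single combed fiber (hypothetical record)."""
    probe: str
    start_um: float
    end_um: float


def describe_fiber(signals, kb_per_um=KB_PER_UM):
    """Return (positions, distances, sizes) in kb for the signals of one fiber."""
    ordered = sorted(signals, key=lambda s: s.start_um)
    # (1) position of each probe, taken as the start of its signal
    positions = {s.probe: s.start_um * kb_per_um for s in ordered}
    # (3) size of each probe signal
    sizes = {s.probe: (s.end_um - s.start_um) * kb_per_um for s in ordered}
    # (2) distance between consecutive probes (end of one to start of the next)
    distances = [
        (a.probe, b.probe, (b.start_um - a.end_um) * kb_per_um)
        for a, b in zip(ordered, ordered[1:])
    ]
    return positions, distances, sizes


fiber = [Signal("green", 40.0, 55.0), Signal("red", 5.0, 25.0)]
positions, distances, sizes = describe_fiber(fiber)
print(positions)  # {'red': 10.0, 'green': 80.0}
print(distances)  # [('red', 'green', 30.0)]
print(sizes)      # {'red': 40.0, 'green': 30.0}
```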

In the present description, the combing technology refers to the technology described in the documents mentioned above, likewise the notion of “combing surface” which corresponds to a treated surface allowing anchorage of the DNA and its stretching by a receding meniscus.

It should be noted that the combing surface is preferably a flat surface on which readings are easier.

“Reaction between the labeled probes and the combed DNA” is understood to mean any chemical or biochemical reaction, in particular immunological type reactions (for example antibody directed against methylated DNA), protein/DNA or nucleic acid/DNA (for example hybridization between complementary segments) or nucleic acid/RNA, or nucleic acid/RNA-DNA hybrid reactions. There may also be mentioned, as examples, DNA-DNA chemical binding reactions using molecules of psoralen or reactions for polymerization of DNA with the aid of a polymerase enzyme.

The hybridization is generally preceded by denaturation of the attached and combed DNA; this technique is known and will not be described in detail.

“Probe” is understood to designate both a single- or double-stranded polynucleotide, containing at least 20 synthetic nucleotides or a genomic DNA fragment, and a “contig”, that is to say a set of probes which are contiguous or which overlap and cover the region in question, or several separate probes, labeled or otherwise. “Probe” is also understood to mean any molecule bound covalently or otherwise to at least one of the preceding entities, or any natural or synthetic biological molecule which may react with the DNA, the meaning given to the term “reaction” having been specified above, or any molecule bound covalently or otherwise to any molecule which may react with the DNA.

In general, the probes may be identified by any appropriate method; they may be in particular labeled probes or alternatively nonlabeled probes whose presence will be detected by appropriate means. Thus, in the case where the probes were labeled with methylated cytosines, they could be revealed, after reaction with the product of the combing, by fluorescent antibodies directed against these methylated cytosines. The elements ensuring the labeling may be radioactive but will preferably be cold labelings, by fluorescence for example. They may also be nucleotide probes in which some atoms are replaced.

The size of the probes may be understood to be of any value measured with an extensive unit, that is to say such that the size of two probes is equal to the sum of the sizes of the probes taken separately. An example is given by the length, but a fluorescence intensity may for example be used. The length of the probes used is between for example 5 kb and 40-50 kb, but it may also consist of the entire combed genome.

Advantageously, in the method in accordance with the invention, at least one of the probes is a product of therapeutic interest which is capable of interacting with the DNA. Preferably, the reaction of the probe with the combed DNA is modulated by one or more molecules, solvents or other relevant parameters.

Finally, in general, “genome” will be used in the text which follows; it should be clearly understood that this is a simplification; any DNA or nucleic acid sequence capable of being attached to a combing surface is included in this terminology.

In addition, the term “gene” will sometimes be used indiscriminately to designate a “gene portion” of genomic origin or alternatively a specific synthetic “polynucleotide sequence”.

In a first embodiment, the method according to the invention is used to allow the screening of breaks in a genome, as well as for the positional cloning of such breaks. It should be noted that the term “break” covers a large number of local modifications of the genome of which the list will be explicitly stated later.

The method according to the present invention consists in determining the position of the potential break points involved in a pathology of genetic origin by hybridization, to combed genomic DNA of patients suffering from said pathology, of a genomic probe of known size (cloned or otherwise) situated in the region of the desired gene. These break points consist of points in the genetic sequence whose surroundings change over several kilobases (kb) between a healthy individual and a diseased individual.

The principle of the definition of the break point is based on the possibility of detecting, by molecular combing, a local modification of the genome studied compared with a genome which has already been studied, at the level of the region(s) considered.

The development of methods for picking out local modifications of the genome of less than 1 kb in size can thus be envisaged with the aid of close-field observation techniques (AFM, STM, SNOM, and the like) or techniques having an intrinsically higher resolution (for example gold nanobead electron microscopy).

More particularly, the present invention relates to a method for identifying a genetic abnormality of a break in a genome, characterized in that:

(a) a certain quantity of said genome is attached to and combed on a combing surface,

(b) the combing product is hybridized with one or more labeled specific probes corresponding to the genomic sequence for which the abnormality is sought,

(c) the size of the fragments corresponding to the hybridization signals and optionally their repetition are measured, and

(d) the presence of a break is deduced therefrom either by direct measurement or by comparison with a standard corresponding to a control length.

By way of illustration, the measurement of the size of the fragments leads to a histogram, that is to say a graphical representation of the lengths of the fragments observed.

In order to produce a histogram of the probe, the number of clones having a defined probe length is evaluated. In principle, the histogram contains only one or two peaks depending on the type of break analyzed, two peaks when the probe hybridizes as two separate fragments and a single peak when it hybridizes as a single fragment.
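A histogram of this kind, and a count of its peaks, might be sketched as follows. The bin width, the minimum count for a peak and the simple local-maximum rule are assumptions chosen for the illustration; the measured lengths are invented.

```python
from collections import Counter


def length_histogram(lengths_kb, bin_kb=5.0):
    """Count signals per bin; keys are integer bin indices (bin i covers
    lengths from i * bin_kb up to, but not including, (i + 1) * bin_kb)."""
    return Counter(int(length // bin_kb) for length in lengths_kb)


def peak_bins(hist, min_count=3):
    """Return the bins that are local maxima holding at least min_count signals."""
    peaks = []
    for b in sorted(hist):
        n = hist[b]
        if n >= min_count and n > hist.get(b - 1, 0) and n >= hist.get(b + 1, 0):
            peaks.append(b)
    return peaks


# Invented measurements (kb) for a probe hybridizing as a single fragment...
intact = [39, 40, 41, 40, 38, 42]
# ...and for the same probe hybridizing as two separate fragments.
broken = [14, 15, 16, 15, 24, 25, 26, 25, 14, 26]

print(peak_bins(length_histogram(intact)))  # [8]     -> one peak, 40-45 kb bin
print(peak_bins(length_histogram(broken)))  # [3, 5]  -> two peaks
```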

In the case of a heterozygous genome, in which one of the alleles is normal for the region considered, the signature of the normal allele (the absence of a break) is superposed on that of the abnormal allele, but can be extracted because of the fact that it is known.

This method can also be used to carry out positional cloning, that is to say to determine the position of one or more unknown genes involved in a pathology. The principle consists, as before, in hybridizing clones of human or animal or plant DNA, serving in this case as probe, to the combed genomic DNA of one or more patients suffering from the pathology studied. The revealing of these hybridizations makes it possible to measure the size of the hybridized fragments and to construct a histogram of the various sizes observed. If the clone used as probe covers a break point of the gene, the signature of this phenomenon will be legible on the length (shorter) of the hybridized fragment.

The use of a limited number of clones specific for the implicated region which may have been deduced by genetic linkage analysis will thus allow a rapid and precise determination of the position of a break point, of a deletion or of any other genetic rearrangement of sufficient size to be resolved by the detection technique combined with the molecular combing.

Obviously in this case, the break is searched out in order to map it; in the diagnosis, the break is known; it is its presence or its absence which is searched out.

Two possibilities may exist (on the assumption that a break point exists in the region of the genome involved in the pathology):

(i) the probe does not overlap the break point,

(ii) the probe overlaps the break point.

In case (i), the measurement of the lengths of the fluorescent probes is comparable to that which would be obtained with the same probe hybridized to a nonpathogenic genomic DNA of the same nature (that is to say essentially of the same size and prepared under the same conditions).

In case (ii), on the other hand, the probe being systematically hybridized to two separate pieces (or more) in the combed genomic DNA (by definition of the existence of a break point), the measurement of the lengths of the hybridized fluorescent probes is different from the result obtained by hybridization to a nonpathological genomic DNA. Moreover, the size of the fragments hybridized to the pathogenic DNA makes it possible to estimate the position of the break point in the clone with a precision of a few kb, or even better if a higher-resolution technique is used.
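The estimate of the break point position from the two fragment sizes can be sketched as below. This is an assumption-laden illustration: the function name and the tolerance are hypothetical, the two fragments are assumed to account for the whole probe, and, with a single uniformly labeled probe, the orientation is not known, so two symmetrical candidate positions are returned.

```python
def break_point_candidates(probe_kb, frag_a_kb, frag_b_kb, tolerance_kb=5.0):
    """Return the two possible break positions (kb from one end of the clone).

    The fragment sizes are rescaled so that they sum to the known probe length,
    which absorbs small measurement errors.
    """
    total = frag_a_kb + frag_b_kb
    if abs(total - probe_kb) > tolerance_kb:
        raise ValueError("fragments do not add up to the probe length")
    shorter = min(frag_a_kb, frag_b_kb)
    position = probe_kb * shorter / total
    return position, probe_kb - position


# Invented example: a 40 kb clone seen as fragments of about 10 kb and 30 kb.
print(break_point_candidates(40.0, 10.0, 30.0))  # (10.0, 30.0)
```

Resolving which of the two candidates is correct would require additional information, for example a second, differently labeled probe on one side of the clone.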

Because of this, only the search for the gene in this clone now remains. Basically, it will involve repeating these measurements for all the clones which are likely to partially cover the region corresponding to the gene. The number of hybridization slides may be reduced by simultaneously hybridizing several differently labeled probes, or by using a method of coding by combination of colors, as will be described below.

This technique makes it possible to determine the position of the potential break points of the region of the genome involved in a genetic pathology by hybridization of cloned genomic DNA to combed genomic DNA obtained from patients. This technique therefore applies to the search for regions of the genome which are responsible for pathologies due to:

the deletion of a portion or of the whole of this region of the genome,

the translocation of all or part of this region of the genome,

the duplication or presence of several copies of all or part of this region of the genome inside it or at any other site of the genome,

the insertion of any genetic sequence inside this region of the genome.

In a second embodiment and in some specific cases, in particular when the genetic abnormality searched out contains major deletions or duplications (in the case of trisomies for example), the method which is the subject of the present invention may be modified since it then involves assaying the genes or a particular sequence.

More particularly, the present invention relates to a method for assaying a given genomic sequence in a genome, characterized in that:

(a) a certain quantity of said genome is attached to and combed on a combing surface,

(b) the combing product is hybridized with a labeled control probe of length lt corresponding to a so-called control genomic sequence, that is to say whose copy number in the genome is known, and with a labeled specific probe of length lc corresponding to the genomic sequence to be assayed, such that said probes may be identified separately,

(c) the total length of the hybridization signals for the two probes, that is to say Lc and Lt, is then measured,

(d) the copy number of the corresponding sequence is calculated for each by the ratio

$Nt = \frac{Lt}{lt} \quad \text{and} \quad Nc = \frac{Lc}{lc}$

and the copy number of the sequence to be assayed relative to the control sequence is deduced therefrom.
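The calculation of step (d) amounts to two divisions and a ratio, as in the following minimal sketch. The function name and the numerical values are invented for the illustration; lengths are in kb.

```python
def copy_number(total_signal_kb: float, probe_kb: float) -> float:
    """Number of hybridized copies: total signal length divided by probe length."""
    if probe_kb <= 0:
        raise ValueError("probe length must be positive")
    return total_signal_kb / probe_kb


# Invented measurements: control probe lt = 40 kb with Lt = 8000 kb of signal,
# specific probe lc = 35 kb with Lc = 10500 kb of signal.
nt = copy_number(8000.0, 40.0)   # Nt = Lt / lt
nc = copy_number(10500.0, 35.0)  # Nc = Lc / lc
print(nt, nc, nc / nt)           # 200.0 300.0 1.5
```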

In the case of the prenatal diagnosis of trisomy 21, the method may consist in the hybridization of a cosmid probe specific for a control chromosome (chromosome 1, for example; probe of length lt), labeled with biotinylated nucleotides, and the hybridization of a cosmid probe specific for chromosome 21 (probe of length lc), labeled with digoxygenin, to combed genomic DNA extracted from amniotic samples, or from any other sample containing cells of fetal origin.

For example, it will be possible to use an avidin-Texas Red (red color) revealing system for the control probe and an antidigoxygenin-FITC (green color) revealing system for the specific probe: the total length of the red hybridization signals observed in a given region of the surface, Lt, and the total length of the green hybridization signals observed in the same region, or in an equivalent region of the surface, Lc, therefore lead to the numbers Nt and Nc defined above.

A ratio Nc/Nt of close to 1 will indicate a normal genotype (2 chromosomes 21 for 2 chromosomes 1), whereas a ratio of close to 1.5 will indicate a trisomic genotype (3 chromosomes 21 for 2 chromosomes 1).
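The interpretation of the Nc/Nt ratio can be sketched as a comparison with the two expected values, 1 and 1.5. The tolerance of 0.15 used here is an arbitrary assumption for the illustration; in practice it would have to be set from the statistical error of the measurement.

```python
EXPECTED_RATIOS = {"normal": 1.0, "trisomic": 1.5}


def classify_ratio(nc: float, nt: float, tolerance: float = 0.15) -> str:
    """Return the genotype whose expected Nc/Nt ratio is closest to the
    measured one, or "inconclusive" if none lies within the tolerance."""
    if nt <= 0:
        raise ValueError("the control probe count must be positive")
    ratio = nc / nt
    label, target = min(EXPECTED_RATIOS.items(), key=lambda kv: abs(ratio - kv[1]))
    return label if abs(ratio - target) <= tolerance else "inconclusive"


print(classify_ratio(205, 200))  # normal
print(classify_ratio(300, 200))  # trisomic
print(classify_ratio(250, 200))  # inconclusive
```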

In general, a significant difference between Nc and the value expected for the number of genomes present which is deduced from Nt is the indication of the presence of a gene abnormality.

In the case of the screening of oncogenes or proto-oncogenes, the same method may be used: a control probe will be hybridized and revealed in red for example and a probe corresponding to the gene or to a portion of the gene searched out will be hybridized and revealed in green for example. After the measurements carried out as above, the Nc/Nt ratio will give the relative abundance of the gene compared with the frequency of two copies per diploid genome.

The aberrant methylation of the CpG islands which is frequently observed in many cancers (92% of colon cancers) can also be detected by the method according to the invention by reaction between the combed DNA and fluorescent antibodies directed against the methylated cytosines.

Indeed, the loss of heterozygosity on chromosome 9p21 is one of the genetic impairments most frequently identified in human cancers. The tumor suppressor gene CDKN2/p16/MTS1 located in this region is frequently inactivated in many human cancers by homozygous deletion. However, another mode of inactivation has been reported which involves the loss of transcription associated with a de novo methylation of 5′ CpG islands of CDKN2/p16 in lung cancers, gliomas and squamous cell carcinomas of the head and neck. These aberrant methylations of the CpG islands also frequently occur in breast (33%), prostate (60%), kidney (23%) and colon (92%) cancer cell lines (J. G. Herman et al., (1995) Cancer Res., Oct. 15, 55(20); 4525-30; M. M. Wales et al., (1995) Nature Med., Jun., 1(6): 570-607).

The precise location of the methylation areas on a gene is of very great importance for understanding the mechanism of the development of cancer and for a possible “screening” test. Molecular combing can detect, with an accuracy of a few kb, the location of such CpG islands involved in the development of cancer.

This technique which makes it possible to determine the copy number of a gene in a genome can also be used to detect the absence of a portion of the genome.

In the case of a pathology characterized by the deletion of a substantial portion of a chromosome, it is indeed sufficient to take, as target sequence, a clone contained in the deleted region, and as control sequence a clone outside this region. It is thereby possible to detect deletions of the size of a cosmid clone (30-50 kb) or greater.

If a sufficient density of combed molecules is available, it is possible to envisage detecting smaller deletions (a few kb), corresponding to a portion of the target sequence used. That is the reason why it is particularly advantageous to place, on the combed surface, at least about 10 copies of the genome.

The statistical error on the Nc/Nt ratio is of the order of 1/√Nc + 1/√Nt. Advantageously, it is advisable to have, on the combed surface, a sufficient number of signals in order to have a statistical error of less than 20% on the Nc/Nt ratio. It is therefore important to have a large number of hybridized probes, typically Nc, Nt>100.
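The link between the number of signals and the 20% bound can be checked with a short calculation. The sketch below assumes independent counting (Poisson) errors, so that a count N carries a relative error of about 1/√N, and sums the two contributions as a conservative estimate; with Nc = Nt = 100 this gives exactly the 20% figure quoted above. The function name is hypothetical.

```python
import math


def ratio_relative_error(nc: float, nt: float) -> float:
    """Conservative relative error on Nc/Nt, assuming Poisson counting errors:
    each count N contributes about 1/sqrt(N)."""
    if nc <= 0 or nt <= 0:
        raise ValueError("counts must be positive")
    return 1.0 / math.sqrt(nc) + 1.0 / math.sqrt(nt)


for n in (25, 100, 400):
    print(n, f"{ratio_relative_error(n, n):.0%}")
# 25 40%
# 100 20%
# 400 10%
```

Since the error decreases only as the square root of the count, halving it requires roughly four times as many hybridization signals.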

However, in practice, it is also possible to increase the accuracy of these measurements by using not one but several types of control probes and target probes without necessarily seeking to distinguish between these types of probe, that is to say by revealing all of them in the same manner.

The possibility of obtaining such a number of signals has been demonstrated: it is possible to determine about one hundred signals on a silanized glass surface having a useful surface area of 20×20 mm. This density may be considerably increased as long as a large quantity of DNA is available.

It appears that a sufficient number of genomes per surface having a useful surface area of 20×20 mm is in the region of 100, when a single probe is used. In the case where several probes or larger probes are used, it is possible to envisage being able to reduce either:

the combing surface and therefore the surface analyzed,

the DNA density used, and therefore the number of combed genomes at the surface.

Depending on the main constraint (speed required, or DNA in a limited quantity), either of these two routes may be used.

The technique disclosed involves the use of preparation protocols which are strict but without particular technical difficulties. At the level of the analysis of the signals, no particular qualification is necessary, thereby making the technique generalizable to all laboratories possessing staff with minimal competence in molecular biology.

A few hundreds of thousands of cells should in principle be sufficient to prepare a genomic DNA solution leading to a high density of combed molecules on the surfaces for analysis. It is therefore in principle no longer necessary to carry out cell cultures in most cases. It should therefore be possible to carry out the sampling-analysis as a whole within a few days.

The simplicity of the signals to be analyzed (which are parallel and distinct from the hybridization background noise) makes it possible to envisage complete automation of the process of analyzing the signals (scan of the surfaces, acquisition and processing of the measurements). Integration with a system for storing surfaces corresponding to various patients makes it possible to envisage high yields, giving the possibility of providing various types of diagnosis within a few days.

The method described above can allow various types of diagnosis: chromosome counting (trisomy, monosomy, and the like), counting of the copies of a gene, detection of known deletions, or other chromosomal modifications resulting in a modification of the hybridized length of a given genomic probe per genome.

It should be noted that it is also possible to detect a partial deletion on a single allele.

It is likewise possible to carry out the hybridization of clones on several different genomes combed on the same surface. For example, the simultaneous combing of the genome of a principal organism and of the genome of host organisms (parasites, bacteria, viruses and the like) and the use of specific probes, on the one hand from the principal organism and, on the other hand from host organisms, makes it possible in principle to determine the ratio number of hosts/number of cells of the principal organism. In the case of an organism infected by a virus, this allows the measurement of the viral load. The figures cited above probably limit the sensitivity of this method of diagnosis to situations where more than one infectious organism exists per 100 host cells approximately.

These various types of diagnosis may be combined by virtue of the use of multiple revealing systems (several colors, or combination of colors), or any other method allowing the distinction between the hybridization signals obtained from distinct probes and intended for a precise diagnosis.

The present invention also relates to diagnostic “kits” containing at least one of the following components:

a combing surface,

probes which are labeled or which are intended to be labeled, corresponding to the abnormalities to be detected,

a device allowing the combing of the DNA,

a control genome and/or control probes, said genome being optionally attached to the surface to be combed,

one or more specific results obtained using the preceding protocols in one or more control situations, so as to provide a grid for the interpretation of the results obtained in the diagnoses carried out, for example in the form of an expert system (software for example),

an expert system which makes it possible to facilitate the carrying out of diagnoses according to the method of the invention.

The principle of the technique being based on the combing of the patient's DNA, this DNA preparation step requires protocols for extraction, combing and the corresponding material (treated surfaces, molecular combing apparatus).

The subject of the invention is also a genomic DNA or a portion of genomic DNA capable of reacting, under molecular combing conditions, with a probe corresponding to a product of transcription or of translation or of regulation.

The diagnosis itself requires the hybridization of specific nucleotide probes and the revealing of these probes, for example by antibody systems. Given that a color coding can, in addition, be carried out in the case of combined diagnoses, it is therefore possible to provide batches of prelabeled probes corresponding to a catalog of particular diagnoses.

Given that the analysis requires the measurement of the length of the signals or more generally of one of the three categories of information described above, a system for analyzing these signals (software and automated equipment) also forms part of this invention.

In a third embodiment, the present invention relates to a method allowing in particular the physical mapping of a genome.

The aim of physical mapping being the ordering of clones within a genome, molecular combing naturally applies to this objective, by simple hybridization of the clones on the combed genome (for example, in the case of a YAC, the whole yeast genome may be combed, so as to dispense with the separation of the artificial chromosome from the natural chromosomes of the yeast).

The position of the clones is obtained by direct measurement of their distance to a reference clone, or to any other reference hybridization signal on the combed genome. The constant extension of the combed DNA then makes it possible to directly establish in kilobases (kb) the respective position of the clones as well as their size, when the latter exceeds the resolution of the method. In particular, in the case of conventional epifluorescence microscopy, whose resolution is half a wavelength, the precise mapping of cDNA (DNA complementary to the RNAs transcribed in the cell) is possible, but without the possibility of accurately measuring the size of the hybridized fragments (exons), which is in general of the order of a few hundred bases. However, using this method, the precise location within the genomic DNA of complete cDNAs or of their fragments may be obtained. For example, it is possible to determine the presence or the absence and the position of the cDNAs corresponding to a protein of interest by hybridization of the cDNAs to the genomic DNA or to a genomic DNA clone (cosmid, BAC, YAC, for example) at the same time as a clone serving as a reference mark.
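The conversion of measured distances into kilobases may be sketched as follows. The extension factor of 2 kb per micron is inferred from the legend of FIG. 8 (about 77.5 μm for 155 kb) and is treated here as an assumption to be calibrated on each type of surface; the function names and the example offsets are hypothetical.

```python
# Assumed extension factor, inferred from "77.5 μm, that is 155 kb" (FIG. 8).
KB_PER_MICRON = 2.0


def microns_to_kb(length_um, kb_per_micron=KB_PER_MICRON):
    """Convert a length measured on the combed surface into kilobases."""
    return length_um * kb_per_micron


def clone_positions_kb(offsets_um, kb_per_micron=KB_PER_MICRON):
    """Map each clone name to its signed distance (kb) from the reference clone.

    `offsets_um` is a dict of signed distances, in microns, measured between
    each hybridization signal and the reference signal.
    """
    return {name: microns_to_kb(offset, kb_per_micron)
            for name, offset in offsets_um.items()}


# Hypothetical measurements for two clones on either side of the reference.
positions = clone_positions_kb({"clone_1": -12.5, "clone_2": 30.0})
```

Only the constancy of the extension matters here: any other calibrated factor may be passed as `kb_per_micron`.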

The use of multiple fragments obtained from a cDNA makes it possible to pick out the presence or absence of one or more genomic DNAs in a vector of the cosmid or YAC type, for example.

The use of higher-resolution methods (near-field microscopy, electron microscopy, and the like) may allow an additional measurement of the size of the probes, but a measurement of the intensity of fluorescence, if it is the mode of observation chosen, may also provide this information.

The method which we are providing makes it possible to minimize the number of hybridizations necessary for ordering a given number of clones, given a fixed number of colors for revealing the hybridizations, or (more generally) of distinct modes of revealing the hybridizations.

The invention relates to a method, characterized in that:

(a) a certain quantity of said genome is attached to and combed on a combing surface,

(b) the combing product is hybridized with probes labeled with radioactive or fluorescent elements and the like, such as beads, particles and the like, corresponding to each clone, such that said probes may be specifically revealed by a color in particular,

(c) the information corresponding to the position of each clone as well as the sizes and the corresponding distances on the genome are extracted,

(d) operations (b) and (c) are repeated n times by modifying the color, the labeling or the mode of revealing the probes, in the knowledge that with p colors, labelings or different modes of revealing, it is possible to position p^(n) clones after n hybridizations.

In the context of the standard methods of mapping, the number I of hybridizations necessary to map N clones with the aid of p labelings, colors or modes of revealing increases linearly with the number of clones N.

Thus, with 3 colors it is nowadays necessary to carry out at least 15 hybridizations of the preceding type in order to map 30 clones.

This number of hybridizations is high, and the number of available colors is in practice limited (even using combinations of fluorophores). Moreover, once all the measurements have been carried out, it is necessary to select among the various possible positions of the clones, which may not always be easy if the measurement errors are taken into account.

The method provided here makes it possible to map a number of clones which increases exponentially with the number of hybridizations carried out.

By way of illustration, the diagram in FIG. 10 represents the result of two hybridizations of 4 clones revealed with two different colors from one hybridization to another (for half of them). The 4 hybridized clones form a differently colored canvas from one hybridization to another, it being possible to pick out the clones via their coding (binary in this case). In this example of 4 clones, a code composed of a succession of 2 colors is sufficient to distinguish each clone:

A=Red then Red

B=Green then Green

C=Red then Green

D=Green then Red.

From 5 to 8 clones, 3 hybridizations will be necessary, in order to distinguish between the clones by a succession of 3 colors.

More generally, using p colors, to map N clones, a number of hybridizations I such that N≤p^(I) will be sufficient.

In comparison with the standard method, 30 clones may be mapped in 5 hybridizations (instead of 30) if only 2 colors are available, and in only 4 hybridizations (instead of 15) if 3 colors are available.
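The counting argument above may be checked with a short sketch. The function names are hypothetical, and the order in which the color successions are allotted to the clones is an arbitrary choice of the illustration: any allotment of distinct successions is suitable, including the one given for clones A to D above.

```python
from itertools import product


def min_hybridizations(n_clones, n_colors):
    """Smallest number I of hybridizations such that n_colors**I >= n_clones."""
    if n_clones < 1:
        raise ValueError("at least one clone is required")
    if n_colors < 2:
        raise ValueError("at least two colors are required for coding")
    rounds, capacity = 1, n_colors
    while capacity < n_clones:
        capacity *= n_colors
        rounds += 1
    return rounds


def assign_color_codes(clones, colors):
    """Give each clone a distinct succession of colors, one per hybridization."""
    rounds = min_hybridizations(len(clones), len(colors))
    return dict(zip(clones, product(colors, repeat=rounds)))


def identify_clone(observed_colors, codes):
    """Return the clone whose code matches the colors seen at one position."""
    for clone, code in codes.items():
        if code == tuple(observed_colors):
            return clone
    return None


codes = assign_color_codes(["A", "B", "C", "D"], ["Red", "Green"])
```

Here `min_hybridizations(30, 2)` gives 5 and `min_hybridizations(30, 3)` gives 4, in agreement with the figures cited above. In practice a signal is identified by the succession of colors observed at the same position of the canvas from one hybridization to the next.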

The mapping principle presented here is simple. However, in order to overcome certain possible experimental artefacts (dispersion of the sizes of the signals, variability of saturation, breaks in the molecules and the like), suitable software for processing images and for statistical analysis will be advantageously used.

The examples below will make it possible to better understand other characteristics and advantages of the present invention.

Finally, in a fourth embodiment, the present invention relates to a method allowing in particular the detection or the location of products capable of reacting with the combed DNA. For example, DNA-binding proteins which regulate the transcription of a gene, during the cell cycle or otherwise, may be detected on combed DNA, and their preferred binding sites determined relative to the position of sequences which are known and which have been picked out, for example, according to the preceding method for carrying out the invention.

In a similar manner, molecules of therapeutic interest which are capable of reacting with DNA may be detected on combed DNA; their effect on other molecules capable of reacting with DNA may also be studied by comparison.

Among the molecules capable of reacting with the combed DNA, there should be mentioned regulatory proteins as described by:

Laughon and Matthew (1984), Nature, 310: 25-30 for the regulatory proteins which attach to Drosophila DNA,

K. Struhl et al. (1987), Cell, 50: 841-846 for regulatory proteins which bind to DNA in their specific binding domain.

These molecules may also be intercalating agents or molecules which modify DNA as described by:

H. Echols et al. (1996), Science, 223: 1050-1056 on the multiple interactions of DNA inducing, for example, transcriptions;

in the review An. Rev. of Bioch. (1988), 57: 159-167, Gross and Ganard describe the hypersensitivity of nuclease sites in chromatin;

Hanson et al. (1976), Science, 193: 62-64 describe psoralen as a photoactive agent in the selective cleavage of nucleotide sequences;

Cartwright et al. (1984), NAS, 10: 5835-5852.

The invention also relates to any molecule, solvent or method linked to a parameter identified by one of the methods described above in accordance with the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1a: Illustration of the case of a clone (red) overlapping the gene (and therefore the break point) searched out. The hashed regions should be thought of as being absent, which means in particular that when two segments of probes exist side by side, they in fact form only one segment having a length equal to the sum of their lengths. The underlying chromosome(s) is (are) represented by white bars.

(a) Situation in a healthy patient

(b1) Deletion(s) of a portion of the gene.

(b2) Deletion of the entire gene.

(c1) and (c2) Translocation of a portion of the gene.

(c3) Translocation of the entire gene.

(d1) Duplication(s) of a portion of the gene (optionally with inversion).

(d2) Duplication(s) of the entire gene (optionally with inversion).

(d3) Repetition(s) of a portion of the gene in another portion of the genome.

(e) Insertion(s) into the gene, of a different portion of the gene (and of the cloned genome).

FIG. 1b: Ideal histograms of the lengths of hybridized clones corresponding to the preceding situations. In all cases (one or two peaks), the situation is clearly distinguishable from that of the whole clone (a).

The histograms of this figure correspond to the signals of the abnormal allele and are therefore obtained after subtraction of the contribution due to the normal allele, represented in (a), from the crude histograms, represented in FIG. 1c. On the x-axis is the size of the hybridized fragments, the (arbitrary) scale being the same as that of FIG. 1a. On the y-axis is the number of fragments observed, in arbitrary units, only the proportions between the various populations being important.

FIG. 1c: Ideal histograms of the lengths of hybridized clones corresponding to the preceding situations and taking into account the normal allele. The contribution of the normal allele is represented by the peak corresponding to the position of the normal case represented in (a). When the abnormal allele contributes towards increasing the value of this peak (case (d3)), the contribution of the normal allele may in general be considered to be equivalent to the value of an abnormal peak.

FIG. 2a: Illustration of the case of a clone partially overlapping the gene searched out.

(a) Situation in a healthy patient.

(b1) Deletion(s) of a portion of the gene. Another situation was not represented, where the deleted portion is fully included in the clone (cf. case corresponding to diagram 1a).

(b2) Deletion of the entire gene.

(c1) and (c2) Translocation(s) of a portion of the gene. Another situation, corresponding to case (c2) of diagram 1a, was not represented, where the translocated portion is fully included in the clone.

(c3) Translocation(s) of the entire gene.

(d1) Duplication(s) of a portion of the gene marked by a double-headed arrow (the duplicated portion is fully included in the clone).

(d2) Duplication(s) of a portion of the gene with inversion.

(d3) Duplication(s) of a portion of the gene (the duplicated portion is partially included in the clone).

(d4) Duplication(s) of the entire gene (without inversion).

(d5) Repetition(s) of a portion of the gene in another portion of the genome.

(e) Insertion(s) into the gene of a different portion of the gene (and of the genome cloned).

FIG. 2b: Ideal histograms of the lengths of hybridized clones corresponding to the preceding situations. In all cases (one or two peaks), the situation is clearly distinguishable from that of the whole clone (a).

The histograms of this figure correspond to the signals of the abnormal allele and are therefore obtained after subtraction of the contribution due to the normal allele, represented in (a), from the crude histograms, represented in FIG. 2c. On the x-axis is the size of the hybridized fragments, the (arbitrary) scale being the same as that of FIG. 2a. On the y-axis is the number of fragments observed, in arbitrary units, only the proportions between the various populations being important.

FIG. 2c: Ideal histograms of the lengths of hybridized clones corresponding to the preceding situations and taking into account the normal allele. The contribution of the normal allele is represented by the peak corresponding to the position of the normal case represented in (a). When the abnormal allele contributes towards increasing the value of this peak (cases (d4) and (d5)), the contribution of the normal allele may in general be considered to be equivalent to the value of an abnormal peak.

FIG. 3a: Illustration of the case of a clone completely included in the gene searched out.

(a) Situation in a healthy patient.

(b1) Deletion(s) of a portion of the gene. Another situation was not represented, where the deleted portion is fully included in the clone (cf. case corresponding to diagram 1a).

(b2) Deletion of the entire gene.

(c1) and (c2) Translocation(s) of a portion of the gene. Another situation, corresponding to case (c2) of diagram 1a, was not represented, where the translocated portion is fully included in the clone.

(c3) Translocation(s) of the entire gene.

(d1) Duplication(s) of a portion of the gene marked by a double-headed arrow (the duplicated portion is fully included in the clone).

(d2) Duplication(s) of a portion of the gene with inversion.

(d3) Duplication(s) of a portion of the gene (the duplicated portion is partially included in the clone).

(d4) Duplication(s) of the entire gene (without inversion).

(d5) Repetition(s) of a portion of the gene in another portion of the genome.

(e) Insertion(s) into the gene of a different portion of the gene (and of the genome cloned).

FIG. 3b: Ideal histograms of the lengths of hybridized clones corresponding to the preceding situations. In all cases (one or two peaks), the situation is clearly distinguishable from that of the whole clone (a).

The histograms of this figure correspond to the signals of the abnormal allele and are therefore obtained after subtraction of the contribution due to the normal allele, represented in (a), from the crude histograms, represented in FIG. 3c. On the x-axis is the size of the hybridized fragments, the (arbitrary) scale being the same as that of FIG. 3a. On the y-axis is the number of fragments observed, in arbitrary units, only the proportions between the various populations being important.

FIG. 3c: Ideal histograms of the lengths of hybridized clones corresponding to the preceding situations and taking into account the normal allele. The contribution of the normal allele is represented by the peak corresponding to the position of the normal case represented in (a). When the abnormal allele contributes towards increasing the value of this peak (cases (c3), (d2), (d3) and (d5)), the contribution of the normal allele may in general be considered to be equivalent to the value of an abnormal peak.

FIG. 4: Histograms of the lengths of the step model. Study of the influence of the characteristic size.

FIG. 5: Histograms of the lengths of the step model. Study of the influence of the break level.

FIG. 6: Histograms of the lengths of the Gaussian model. Study of the influence of the characteristic size.

FIG. 7: Histograms of the lengths of the Gaussian model. Study of the influence of the break level.

FIG. 8: Simulation of histograms in the case of the hybridization of a BAC (125 kb) on a break point: (a) fragments of 35 and 90 kb; (b) fragments of 50 and 75 kb; (c) fragments of 60 and 65 kb; (d) control situation; (e) real histogram of a contig of cosmid clones hybridized to human genomic DNA (length of the contig: about 77.5 μm, that is 155 kb). The x-axis represents the size of the fragments in microns. The y-axis represents the number of continuous fragments of hybridized probes.

FIG. 9: Illustration of the principle of mapping without color coding. Each line represents the possible location of the new subclone mapped relative to that with which it is hybridized. The orientation of the first two clones is arbitrary. At each stage, two positions on either side of the old clone used are possible for the new hybridized clone.

FIG. 10: Illustration of the principle of mapping with color coding.

At each hybridization, the same subclones are hybridized, giving rise to the same hybridization motif (canvas) if the color information is disregarded. The two palettes of color which are used make it possible to pick out without ambiguity each of the subclones (principle of coding).

FIG. 11: Total genomic DNA of yeast containing an artificial chromosome (YAC774G4) was combed on treated surfaces. Cosmids belonging to two adjacent contigs separated by an unknown interval were hybridized in pairs so as to be able to reconstitute an accurate map of their arrangement. Among the cosmids used, 1F11 contains the majority of the sequence of the gene for calpain 3 whose mutations are responsible for girdle myopathy type 2A.

The figure shows images characteristic of each hybridization (one cosmid being hybridized and revealed in red, on the left, the other in green, on the right) as well as the map reconstituted from the measurements, to within approximately 3 kb. The scale is given by the 20 kb bar. This experiment made it possible in particular to correct a map of the region obtained with the aid of STSs (Richard et al., 1995, Mammalian Genome, 6, 754-756) and to obtain the chromosomal orientation of the gene.

FIG. 12: Human total genomic DNA was combed on treated surfaces. Cosmids belonging to 3 adjacent contigs of the region of chromosome 9 containing the gene (not cloned) for “Tuberous Sclerosis 1” (TSC1) were hybridized in pairs and revealed in 2 different colors in order to measure the size of the intervals separating these contigs. The figure shows 3 images characteristic of each of these measurements giving the distances between cosmids with an accuracy of 1.5 to 3 kb for 50 to 80 measurements. The code for the cosmids used is indicated, as well as the contig to which they belong (in parentheses). The green color is represented in white and the red color in gray.

MATERIALS AND METHODS

Histogram Method

The diagram in FIG. 1a presents a number of situations in the case of a probe completely covering the region of the genome searched out (region containing any genomic sequence whose modification causes the appearance of a pathology). Preferably, the probe covers part of the genome not involved in the break, that is to say on either side of the deletion for example.

The corresponding theoretical histograms (on the assumption that the genomic DNA combed is intact, which is naturally not very realistic, and which will be discussed in the text which follows) are presented in the diagram of FIG. 1b. FIG. 1c represents the crude results before extracting the contribution of the normal allele. Similar situations are represented in the case where the clone used does not completely cover the gene (or the region of the genome) searched out on the diagrams of FIGS. 2a and 3a.

In the text which follows, the expression “this genome region” (meaning involved in the pathology) or “the gene” will be used interchangeably since the technique strictly speaking does not make it possible to determine a gene, but a region of the genome which is impaired (resulting in “a break point”).

All the situations represented in the preceding diagrams give an idea of the expected difference between a situation where the cloned probe used hybridizes to a region of the genome of a healthy individual and one where this region is modified, as in the case of a diseased individual.

In all the cases presented (except that of a translocation of all or part of the gene containing the entire clone; cf. diagram 3a (c3)), the histogram (theoretical) is distinguishable from the “normal” histogram:

either by the presence of a single peak but which is situated at a value different from the length of the clone (higher, as in case (d2) for example, or lower, as in case (b1))

or by the presence of two peaks or more.

Simply knowing the histogram of the lengths and the length Lc of the clone (obtained, for example, by hybridization on the DNA of a healthy individual and location of the peak of the corresponding histogram) therefore makes it possible to know whether the clone studied covers a break point (in the broad sense meant here).
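The comparison of the principal peaks may be sketched as follows. The bin width, the function names and the measurements are assumptions of the illustration; only the principal peak is compared here, whereas a complete analysis would examine every peak of the histogram.

```python
from collections import Counter


def histogram(lengths_kb, bin_width_kb=5.0):
    """Count measured lengths in bins centred on multiples of the bin width."""
    counts = Counter(int(length / bin_width_kb + 0.5) for length in lengths_kb)
    return {index * bin_width_kb: n for index, n in sorted(counts.items())}


def main_peak(lengths_kb, bin_width_kb=5.0):
    """Return the centre (kb) of the most populated bin."""
    if not lengths_kb:
        raise ValueError("no hybridization signal measured")
    counts = histogram(lengths_kb, bin_width_kb)
    return max(counts, key=counts.get)


def covers_break_point(patient_lengths_kb, healthy_lengths_kb, bin_width_kb=5.0):
    """True when the patient's principal peak is displaced from the length Lc
    estimated on the DNA of a healthy individual."""
    lc = main_peak(healthy_lengths_kb, bin_width_kb)
    return abs(main_peak(patient_lengths_kb, bin_width_kb) - lc) >= bin_width_kb


# Hypothetical measurements (kb); the short values stand for broken molecules.
healthy = [50.0, 51.0, 49.5, 50.2, 48.0, 30.0, 12.0]
patient = [35.0, 34.0, 36.0, 35.5, 90.0, 50.0]
```

With these hypothetical values the healthy sample gives Lc = 50 kb and the patient's principal peak lies at 35 kb, so that `covers_break_point(patient, healthy)` is true.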

The abnormalities observed at the level of the histogram may be grouped into several categories:

(a) one peak for a length Li<Lc

(b) one peak for a length Ls>Lc

(c) several peaks shorter than Lc

(d) several peaks longer than Lc

(e) one peak for a length L=Lc and one or more peaks shorter than Lc

(f) one peak for a length L=Lc and one or more peaks longer than Lc

(g) one or more peaks of lengths<, = and >Lc.
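The grouping above may be sketched as a simple classification of the positions of the peaks relative to Lc. The tolerance used to decide that a peak lies at Lc, the function name and the labels returned for situations outside the list are assumptions of the illustration.

```python
def classify_peaks(peaks_kb, lc_kb, tolerance_kb=2.0):
    """Return the category (a) to (g) of a list of peak positions.

    "normal" is returned for a single population at Lc, and "unclassified"
    for combinations which the list above does not name.
    """
    if not peaks_kb:
        return "no signal"
    shorter = [p for p in peaks_kb if p < lc_kb - tolerance_kb]
    longer = [p for p in peaks_kb if p > lc_kb + tolerance_kb]
    at_lc = [p for p in peaks_kb if abs(p - lc_kb) <= tolerance_kb]
    if at_lc:
        if shorter and longer:
            return "g"
        if shorter:
            return "e"
        if longer:
            return "f"
        return "normal"
    if shorter and longer:
        return "unclassified"
    if shorter:
        return "a" if len(shorter) == 1 else "c"
    return "b" if len(longer) == 1 else "d"


# Peaks at 35 and 90 kb for a clone of 125 kb fall in category (c).
category = classify_peaks([35.0, 90.0], lc_kb=125.0)
```

The category alone does not identify the underlying genetic abnormality; it only indicates which of the interpretations discussed below should be considered.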

The classes of abnormalities listed above are unfortunately generally not sufficient to define the underlying genetic abnormality.

The diagram of FIG. 1b indeed shows that it is not possible a priori to distinguish between a partial translocation (c1) or (c2), a complete translocation (c3) or an insertion in the case of a clone which would cover the entire gene. The same applies for the case where the clone would only partially cover the gene (diagrams of FIGS. 2b and 3b).

On the other hand, the data for the lengths corresponding to the peaks of the histogram make it possible to specify in some cases the position, in the clone, of the end of the portion of the genome deleted, translocated, duplicated or in contact with an insertion.

In any case, the measurement of a histogram different from the one expected in the case of a normal genome indicates the presence, in the genome studied, of an abnormality at the level of the sequence contained in the clone used.

(a: one peak for a length Li<Lc): this almost certainly involves one (or more) deletions. However, as has been noted, it may involve a case where several peaks are grouped together. We shall see that it is generally possible to distinguish between these two cases by comparison with a probe corresponding to a different region of the genome (reference probe).

In the case of a deletion, it is possible to deduce therefrom the size by subtracting the two remaining lengths (which together form the single measured length Li) from the length Lc of the intact clone (only in the case where the clone covers the entire deleted portion of the gene). The position relative to the clone cannot however be known, given that only additional hybridizations with other clones or subclones can allow this determination.

However, the same type of histogram may correspond to several deletions, in which case it is only possible to measure the total size of the deletions (still on the assumption that the clone would cover the entire deleted portion of the gene).

In the case where the clone only partially covers the gene, the deletion detected may represent only a portion of the effective deletion of the gene (cf. diagram 2a (b1) for example).

(b: one peak for a length Ls>Lc): in this case, one or more duplications inside the gene or at its ends, of a portion or of the whole of said gene, are involved. The clone may completely or partially cover the gene. There may optionally be inversion of the sequences, but this cannot be detected in the case where the clone completely covers the gene. The position of the duplication(s) cannot be determined.

(c: several peaks shorter than Lc): partial translocations of the gene or the insertion of a sequence inside the gene may be involved in this case. Assuming a translocation, it is possible to specify the two ends of the gene within the clone (case where the clone completely covers the gene), or to put forward two possibilities for these ends (in the case where the clone partially covers the gene).

(d: several peaks longer than Lc): several duplications of all or part of the gene in the region of the genome covered by the clone may be involved in this case.

(e: one peak for a length L=Lc and one or more peaks shorter than Lc): one (or more) duplications of all or part of the gene in a region of the genome different from the region covered by the clone may be involved, this in the case where the clone covers the whole or a portion of the gene.

(f: one peak for a length L=Lc and one or more peaks longer than Lc): although this situation was not represented in diagrams 1 to 3, it involves a case of duplication of the (d) type associated with a normal gene.

(g: one or more peaks of lengths <, = and >Lc): combinations of (e) and (f).

In a number of situations, the analysis of the hybridization signals and of their distance, in the case where they would be aligned, should moreover provide additional information.

The example in diagram 1a (e) will illustrate this idea: in this case of an insertion, it is necessary to imagine that any segment which is not hybridized, but which is surrounded by two colinear hybridizations (of any size), is capable of representing an insertion: a histogram of these measurements will therefore be constructed; if an insertion really exists, it should therefore appear as a peak, in the same way as any hybridization.

This measurement will also serve in the case of deletions or of translocations, for example.

The technique disclosed above assumes that the hybridized combed genomic DNA is intact. Indeed, the reference situation, that of a healthy individual whose DNA corresponding to the clone used for the hybridization is not modified, is theoretically represented by a peak at the value of the length of the clone.

In practice, the DNA is subjected to hydromechanical stresses during its preparation, and also during the combing, which leads to numerous random breaks. These random breaks may occur within the target sequence for the probe, which will give rise, after combing, to a separation into several pieces of said sequence. During the hybridization with the probe, these segments will be hybridized, and after revealing, measured as fragments whose size is less than that of the target sequence.

These breaks will not necessarily affect all the target sequences in all the genomes used for the combing. Depending on the number of human genomes combed on the surface studied, and depending on the preparation conditions, the proportion of target sequences broken will remain reasonably small. In this case, the ideal histogram consisting of a single peak at the length of the probe will be replaced by a smaller peak accompanied by a distribution tail at lower lengths. Experience and basic practical considerations show that an optimum number of combed genomes per surface studied is at least approximately ten under the experimental conditions described.

In the text which follows, we study simulations of break processes (resulting from the preparation and the combing of the DNA) which depend on two parameters and which make it possible to evaluate the influence of these phenomena on the theoretical histogram of the lengths.

Two parameters appear to be important in the break phenomenon: the characteristic size and the break level.

The characteristic size defines approximately the size limit reached by molecules subjected to a large number of manipulations. One example of possible manipulation is pipetting: depending on the diameter of the cone of the pipette, the maximum size limit of the molecules will be larger or smaller. Another example of manipulation is the vortexing of the DNA solution.

The break level is intended to model the number of manipulations: for example, the higher the number of pipetting operations, the greater the risk of the molecules being broken.

A. Step Model

The easiest way of representing a model of the manipulations of DNA in solution consists in attributing a zero probability of a break to the fragments having a size less than the characteristic size L_(o) and a probability of a break equal to 1 to the fragments having a size greater than L_(o). In addition, a fragment of size L which breaks is cut at a random position into two fragments:

L<L_(o)=>P_(o)(L)=0

L>L_(o)=>P_(o)(L)=1

In order to introduce a break level into this model, a parameter sets the number of break processes to which a molecule and its successive fragments are subjected.

This model makes it possible to simulate the expected histograms during a hybridization of a probe of known length L_(c) as a function of the parameters of the model. The conditions chosen are: probe hybridizing with a chromosome of 50 Mb, 200 combed genomes, L_(c)=50 kb.
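The step model may be sketched as follows, under the conditions stated above (chromosome of 50 Mb, 200 combed genomes, L_(c)=50 kb). This is a sketch and not the program used for the figures: the uniform choice of the break position, the location of the target in the middle of the chromosome and all function names are assumptions of the illustration.

```python
import random


def step_break_probability(length_kb, l0_kb):
    """Step law: fragments longer than L_o always break, shorter ones never."""
    return 1.0 if length_kb > l0_kb else 0.0


def break_once(fragments, l0_kb, rng):
    """Apply one break process to a list of (start, end) fragments, in kb."""
    result = []
    for start, end in fragments:
        if rng.random() < step_break_probability(end - start, l0_kb):
            cut = rng.uniform(start, end)  # assumed: uniform break position
            result.extend([(start, cut), (cut, end)])
        else:
            result.append((start, end))
    return result


def hybridized_lengths(fragments, target_start, target_end):
    """Lengths of the parts of the target sequence carried by each fragment."""
    lengths = []
    for start, end in fragments:
        overlap = min(end, target_end) - max(start, target_start)
        if overlap > 0:
            lengths.append(overlap)
    return lengths


def simulate_step_model(l0_kb, n_steps, probe_kb=50.0,
                        chromosome_kb=50_000.0, n_genomes=200, seed=0):
    """Return the hybridized lengths expected for all the combed genomes."""
    rng = random.Random(seed)
    target_start = (chromosome_kb - probe_kb) / 2  # assumed target position
    target_end = target_start + probe_kb
    lengths = []
    for _ in range(n_genomes):
        fragments = [(0.0, chromosome_kb)]
        for _ in range(n_steps):
            fragments = break_once(fragments, l0_kb, rng)
        lengths.extend(hybridized_lengths(fragments, target_start, target_end))
    return lengths


if __name__ == "__main__":
    lengths = simulate_step_model(l0_kb=100.0, n_steps=10)
    intact = sum(1 for x in lengths if abs(x - 50.0) < 1e-6)
    print(f"{intact} intact signals out of {len(lengths)} hybridized fragments")
```

The list returned may then be binned to obtain a histogram for each pair of values of the characteristic size and of the break level.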

A.1. Influence of the Characteristic Size on the Histogram

FIG. 4 represents, for various break levels (or steps), the shape of the theoretical histograms obtained for characteristic size values (L_(o), L_(break)) of 1, 2, 4 or 8 times the size of the probe. It appears clearly that the closer the characteristic size to the size of the probe, the more the histogram becomes differentiated from the single peak described in part 2, the tail of the distribution at lengths shorter than the length L_(c) of the probe becoming greater, until it practically causes the peak to disappear for the expected value L_(c).

A.2. Influence of the Break Level on the Histogram

FIG. 5 presents, for different characteristic sizes, the variation of the theoretical histogram as a function of the break level, that is to say the number of random break steps (N time steps). It appears naturally that the peak at the expected length L_(c) decreases when the break level increases.

In this model, however, the peak does not disappear completely, even when the break level increases indefinitely, since no break takes place when all the remaining fragments have a size less than L_(o).

A.3. Advantage of the Model

The main advantage of the preceding model is to provide a rapid means for evaluating the probable shape of the histogram of the lengths of the probe fragments hybridized, so as to be able to interpret the results of the observations.

B. Gaussian Model

A slightly more realistic model consists in a similar model of successive break steps, but with a break law different from the “all-or-nothing” law of the step model. A simple way of “rounding off” this law is to replace the preceding probability by a Gaussian break law of probability:

P_(G)(L)=1−e^(−(L/L_(o))^2)
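The two break laws may be compared numerically as in the sketch below, in which the function names are hypothetical. For a fragment of size L = L_(o), the Gaussian law gives a probability of a break of 1−e^(−1), that is approximately 0.63, and it tends towards the step law for fragments which are much shorter or much longer than L_(o).

```python
import math


def gaussian_break_probability(length_kb, l0_kb):
    """Gaussian break law: P_G(L) = 1 - exp(-(L / L_o)**2)."""
    return 1.0 - math.exp(-(length_kb / l0_kb) ** 2)


def step_break_probability(length_kb, l0_kb):
    """Step law of the preceding model, given here for comparison."""
    return 1.0 if length_kb > l0_kb else 0.0


if __name__ == "__main__":
    l0 = 100.0  # characteristic size in kb, chosen for the illustration
    for ratio in (0.25, 0.5, 1.0, 2.0, 4.0):
        length = ratio * l0
        print(f"L/Lo = {ratio:4.2f}   "
              f"step = {step_break_probability(length, l0):.0f}   "
              f"Gaussian = {gaussian_break_probability(length, l0):.3f}")
```

Unlike the step law, the Gaussian law leaves a small but non-zero probability of a break for fragments shorter than L_(o).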

B.1. Influence of the Characteristic Size on the Histogram

FIG. 6 represents, for various break levels, the shape of the theoretical histograms obtained for characteristic size values (L_(o), L_(break)) of 1, 2, 4 or 8 times the size of the probe, which are identical to those used in the simulation carried out in the context of the step model. Likewise, the closer the characteristic size to the size of the probe, the more the histogram becomes differentiated from the single peak described in part 2, the tail of the distribution at lengths shorter than the length L_(c) of the probe becoming greater, until it practically causes the peak to disappear for the expected value L_(c).

B.2. Influence of the Break Level on the Histogram

FIG. 5 presents, for various characteristic sizes, the variation of thetheoretical histogram as a function of the break level, that is to saythe number of random break steps (N time steps). It appears naturallythat the peak at the expected length L_(c) decreases when the breaklevel increases.

However, this model behaves more realistically than the preceding one, since when the break level increases, the peak corresponding to L_(c) decreases regardless of the value of the characteristic size, and disappears completely when the break level increases indefinitely.

B.3. Advantage of the Model

Since the Gaussian model is the more realistic of the two, it is the one used in the text which follows to simulate histograms of the lengths of hybridized probe fragments.
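A simulation of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the simulation used to produce the figures: the hybridized region is treated as an isolated segment of length L_(c), each fragment breaks at most once per step with probability P_(G)(L), and the break position is drawn uniformly along the fragment. The function names are hypothetical, and the counts obtained are not expected to reproduce the values read from the figures.

```python
import math
import random


def gaussian_break_probability(length, l_o):
    """Gaussian break law P_G(L) = 1 - exp(-(L / L_o)^2)."""
    return 1.0 - math.exp(-((length / l_o) ** 2))


def simulate_breaks(initial_length, l_o, n_steps, rng):
    """Apply n_steps random break steps to one segment.

    Assumption: at each step every fragment breaks at most once, at a
    uniformly drawn position. Returns the list of fragment lengths.
    """
    fragments = [initial_length]
    for _ in range(n_steps):
        next_fragments = []
        for length in fragments:
            if rng.random() < gaussian_break_probability(length, l_o):
                cut = rng.uniform(0.0, length)
                next_fragments.extend([cut, length - cut])
            else:
                next_fragments.append(length)
        fragments = next_fragments
    return fragments


def intact_count(n_molecules, l_c, l_o, n_steps, seed=0):
    """Count simulated segments that are still unbroken after n_steps."""
    rng = random.Random(seed)
    intact = 0
    for _ in range(n_molecules):
        if len(simulate_breaks(l_c, l_o, n_steps, rng)) == 1:
            intact += 1
    return intact


if __name__ == "__main__":
    # 200 occurrences of a 125 kb target, two characteristic sizes, N = 10.
    for l_o in (100.0, 200.0):
        print(f"L_o = {l_o:.0f} kb: intact = {intact_count(200, 125.0, l_o, 10)}")
```

Collecting all fragment lengths returned by `simulate_breaks` over many segments and binning them gives a histogram of the kind discussed above.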

C. Simulation of a Hybridization to a Break Point

To study the conditions for the applicability of our technique, we will present the simulation of three types of situations characteristic of the hybridization of a clone to a break point. We will take as an example the case of a BAC (Bacterial Artificial Chromosome) of 125 kb covering a break point involved in a translocation (cf. for example diagram 2 a (c1)).

The simulations presented in FIG. 8d correspond to a situation where the clone is hybridized to a normal genomic DNA: under these conditions, most of the hybridizations have a length of L_(c)=125 kb, apart from the fragments broken because of the manipulations. We chose two values of the characteristic size L_(o): L_(o)=100 (<L_(c)) and L_(o)=200 (>L_(c)). A single break level, N=10, is used.

These histograms show that it is possible to distinguish a peak at 125 kb in most cases, even fairly unfavorable ones such as the case L_(o)=100, N=10.

C.1. Fragments of 35 and 90 kb

FIG. 8a presents the results of the simulation of a histogram of the lengths of fragments hybridized in the case where the BAC of 125 kb hybridizes to two non-adjacent portions of the genome, of very different sizes (35 and 90 kb), as in the case described in diagram 2 a (c1).

In the two cases (L_(o)=100, L_(o)=200), the two peaks at 35 and 90 kb are recognizable, although the situation is considerably more favorable in the case where the characteristic size is considerably greater than the largest fragment.

It should be noted, moreover, that even in the case of a smaller sample, which would lead to a peak at 90 kb which is not very distinct, the peak at 35 kb would remain, indicating an abnormality relative to the normal size of the clone.

C.2. Fragments of 50 and 75 kb

FIG. 8b represents the result of simulations in the case of two fragments whose sizes are not very different from each other. Here again, the corresponding peaks are well defined and distinct from each other. In this case, it can be predicted that these two peaks will remain identifiable with a smaller sample.

C.3. Fragments of 60 and 65 kb

FIG. 8c represents the result of simulations in the case of two fragments of very similar sizes. In both cases, the peaks stand out clearly from the histogram of the small fragments, but there is a risk that the limited accuracy of the measurements will blend them into a single peak.

Here again, such information will nevertheless be sufficient to infer the existence of a break point situated about midway along the hybridized BAC.

C.4. Conclusions

The preceding simulations show that, under conditions which avoid excessive breaking of the DNA before and during combing, it is possible to distinguish unambiguously between a control situation where the probe hybridizes to a normal human genome, and a situation where the probe hybridizes to a genome modified by the presence of a break point.

The heuristic parameters characterizing combing show, however, that it is preferable to have a characteristic size greater than or equal to the maximum hybridization length. The latter being unknown, a simple criterion is therefore a characteristic size of the order of or greater than the length of the clone used as probe.

To judge experimentally whether this criterion is met, it therefore seems desirable to use, in addition to the probes intended for the search for an abnormality in the genome, a control probe of comparable size which hybridizes to a different portion of the genome. The corresponding histogram would thus make it possible to determine qualitatively the parameters of the model reflecting the experimental situation.

FIG. 8e illustrates an experimental implementation of the preceding example described in FIG. 8a. Human genomic DNA was combed according to the protocol described below. Five contiguous cosmid probes of chromosome 9 (280A6, 37A1, 149B8, 134A11, 99B4, belonging to the contig of the TSC1 gene) of about 155 kb in total were hybridized to this DNA and then revealed with fluorescent antibodies. A histogram of the measured hybridization lengths was plotted; it is very similar to the histogram of FIG. 8d for L_(break)=100. It is noted, moreover, that the real histogram was obtained on an equivalent of less than 40 diploid genomes (simulations corresponding to 100 diploid genomes).

Surface Treatment

Slides of 22×22 mm are prepared and coated with silane using the method described in patent applications PCT/FR95/00164 and PCT/FR95/00165.

Preparation of the Cosmid Probes

The cosmid DNA is labeled by extension from a random primer with nucleotides modified with digoxigenin or which are biotinylated. For the probes labeled with biotin, the Bioprime™ DNA labeling “kit” (Gibco-BRL), containing random octameric primers and biotin-14-dCTP, is used. For the labeling with digoxigenin, a different mixture of dNTPs is used which contains dCTP, dATP and dGTP (0.1 mM), dTTP (0.065 mM) and dig-11-dUTP (0.033 mM).

The size and the concentration of the labeled DNA fragments are confirmed by electrophoresis on a 0.6% agarose gel and by densitometry on bands. The labeling efficiency is evaluated using spots deposited on nylon membranes using serial dilutions of digoxigenin probes and of biotin probes. After incubation with an anti-dig alkaline phosphatase (AP) (Boehringer Mannheim) or streptavidin AP (Gibco-BRL), the spots are revealed with NBT (NitroBlue Tetrazolium) and BCIP (5-bromo-4-chloro-3-indolylphosphate) from Gibco-BRL. The positive spots are compared with the results obtained from control DNA samples labeled with dig or with biotin.

In order to limit the background noise, the probes are finally purified on Bio-Spin 6 columns (Biorad) by centrifugation for 5 min at 1500 g.

Solution of Yeast DNA

Yeast DNA containing YAC 774G4 (1600 kb) is prepared in 100 μl blocks of 0.8% LMP agarose (1 μg/block) using a protocol for preparing standard PFGE blocks. The blocks are stored in 0.5 M EDTA at 4° C. and rinsed with 15 ml TE buffer (10 mM Tris/1 mM EDTA, pH 8) for 2 hours before being used. Each block is stained with 3.3 μM YOYO-1 (Molecular Probes) in 100 μl T₄₀E₂ (40 mM Tris/2 mM EDTA, pH 8) for 1 hour at room temperature (RT). The agarose is then melted for 45 minutes at 68° C. and digested for 2 hours at 40° C. using 2 U of β-agarase I in 1× NEB agarase buffer (Biolabs) per block.

The dilution of the DNA (0.25 μg/ml) in 50 mM MES (pH 5.5) is carried out with great care in order to avoid breaking the DNA strands. The solution is then poured into a 4 ml Teflon™ reservoir allowing the introduction of 3 silanized 22×22 mm slides for combing. The DNA solution may also be stored at 4° C. for several days.
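The dilution arithmetic can be checked with a short sketch. It assumes that the whole DNA content of one block (1 μg) is diluted to the stated concentration; the function name is hypothetical.

```python
def dilution_volume_ml(dna_mass_ug, target_conc_ug_per_ml):
    """Final volume (ml) needed to bring a DNA mass to a target concentration."""
    if target_conc_ug_per_ml <= 0:
        raise ValueError("target concentration must be positive")
    return dna_mass_ug / target_conc_ug_per_ml


# Assumption: one block containing 1 ug of DNA, diluted to 0.25 ug/ml.
volume_ml = dilution_volume_ml(1.0, 0.25)
print(f"final volume: {volume_ml:.1f} ml")
```

Under that assumption the final volume equals the capacity of the 4 ml reservoir.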

Molecular Combing

The silanized slides are soaked in the Teflon™ reservoir containing a solution of total yeast DNA (0.25 μg/ml in 50 mM MES, pH 5.5), incubated at RT and extracted from the reservoir after 10 minutes of incubation using a simple mechanical device. During the incubation, the DNA molecules become anchored on the surface by their ends. Extracting the surface from the reservoir has the same effect as the evaporation provided for in the “drop method”: the meniscus moves relative to the surface and exerts a constant pulling force on the molecules remaining in the reservoir.

The surfaces covered with the combed DNA are then examined with an epifluorescence microscope so as to check the combing characteristics. Representative fields of view are recorded for each slide for the post-hybridization check.

The surfaces are then bonded to microscope slides (cyanoacrylate) and heated overnight at 60° C. They can be stored for several months if they are protected from moisture at −20° C. or at room temperature. The surfaces are then dehydrated before denaturation using baths containing increasing concentrations of ethanol (70%, 90%, 100%).

Denaturation and Hybridization

The surfaces are denatured (70% deionized formamide/30% 2×SSC, pH 7.0) for 4 minutes at 70° C. and immediately immersed for 5 minutes in cold ethanol baths (0° C.) at increasing concentrations (70%, 90%, 100%). The surfaces are then allowed to dry.

50 ng of biotin-labeled probes and 50 ng of digoxigenin-labeled probes are then mixed with 3 μg of human Cot-1 DNA and 10 μg of herring sperm DNA in 10 μl of hybridization buffer (50% deionized formamide/10% dextran sulfate/2×SSC/1% Tween 20, pH 7). The probes are denatured for 5 minutes at 80° C. and immediately chilled to 0° C.

10 μl of the probe solution are added per 22×22 mm combed slide, which is then covered with an untreated slide and sealed with a rubber-cement type polymer marketed by Sanford, USA. The hybridization is carried out overnight at 37° C. in a humid chamber (HC).

Revealing with Fluorescent Probes

After hybridization, the slides are washed for 5 minutes at RT in 3 baths (50% deionized formamide/2×SSC, pH 7) and for 5 minutes at RT in 3 baths of 2×SSC. The slides are incubated for 30 minutes at 37° C. (HC) with 50 μl per slide of a blocking solution (1.5% w/v of reagent (Boehringer Mannheim) in 4×SSC/0.05% Tween 20, pH 7.2).

The detections of the biotin- and digoxigenin-labeled probes are carried out simultaneously using the same protocol for each detection layer, each antibody layer being incubated for 30 minutes at 37° C., the slides being washed after each layer (3 times 5 minutes at RT in 4×SSC/0.05% Tween 20).

For the biotin-labeled probes, the following layers are used (50 μl of hybridization buffer per slide): (1) 40 mg/ml of Avidin-Texas Red (Vector). (2) 5 mg/ml of goat anti-biotinylated avidin (Vector). (3) 40 mg/ml of Avidin-Texas Red (Vector).

For the digoxigenin-labeled probes: (1) 34 mg/ml of a mouse anti-dig conjugate with FITC (Jackson). (2) 28 mg/ml of donkey anti-mouse-FITC (Jackson). (3) 30 mg/ml of mouse anti-rabbit-FITC (Jackson).

After detection, the slides are rapidly rinsed in 1×PBS and mounted with a reagent (Vectashield, Vector) before examination. The slides may be kept for months at 4° C. in the dark.

Example 1

In the case of the combing of human genomic DNA, simultaneous hybridizations of two cosmids separated by a gap of several tens of kb (180F1 and 50D9) are carried out: it was possible to measure about 80 coupled signals (cosmid 1, gap, cosmid 2) on a single 22×22 mm coverslip, corresponding to a total distance of about 120 kb (see Example 5).

This result indicates that about 100 hybridizations of a whole BAC can be observed on a normal human genome. This experimental situation is to be compared with the theoretical situations envisaged in FIG. 8: the histograms 8d indeed show that for 100 diploid genomes (therefore 200 occurrences of a cloned sequence), and with particular parameters, about 10 intact hybridization signals of a BAC of 125 kb (case L_(o)=100 kb), or 16 intact hybridization signals (case L_(o)=200 kb) are expected. In other words, the theoretical parameters used in the simulations are a lot more pessimistic than the experimental conditions which it is currently possible to produce.

This theoretical and experimental analysis as a whole therefore allows a first validation of the present invention.

In addition to the search for break points with the aid of BACs (or other probes of equivalent or smaller size) which is mentioned above, it is also possible to envisage using YACs (available for the entire human genome, and for other genomes). Simulations similar to those presented above show that it is possible to envisage detecting the hybridization of a YAC of 1600 kb in two distinct fragments of 400 and 1200 kb (for example) on the basis of 100 combed genomes under conditions corresponding to L_(o)=600, N=10 in the Gaussian model.

However, in contrast to the observations of BACs or of probes of a similar or a smaller size, which may be carried out without difficulty with the aid of an epifluorescence microscope equipped with a ×100 or ×63 lens and a camera (maximum size of a field of view: respectively 123 μm and 195 μm approximately), the measurement of fragments of several hundreds of kb requires the use of lenses with a lower magnification, for which problems of detectability of the signals may possibly exist:

×40: field of view of maximum size 307 μm approximately.

×20: field of view of maximum size 614 μm approximately.
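The relation between lens, field of view and measurable fragment size can be sketched as follows. The conversion factor of 2 kb per μm is an assumption made here for illustration (it is consistent with the figures of Example 3, where about 1400 μm of signal corresponds to 70 copies of a probe of about 40 kb); the function names are hypothetical.

```python
# Assumption: stretching factor of combed DNA, in kb per micrometre.
KB_PER_UM = 2.0

# Approximate maximum field-of-view size (um) for each lens magnification,
# as quoted in the text.
FIELD_OF_VIEW_UM = {100: 123.0, 63: 195.0, 40: 307.0, 20: 614.0}


def max_fragment_kb(magnification):
    """Longest fragment (kb) that fits in one field of view."""
    return FIELD_OF_VIEW_UM[magnification] * KB_PER_UM


def highest_magnification_for(fragment_kb):
    """Highest magnification whose field of view contains the fragment.

    Returns None if the fragment does not fit in any listed field of view.
    """
    fitting = [m for m in FIELD_OF_VIEW_UM if max_fragment_kb(m) >= fragment_kb]
    return max(fitting) if fitting else None


for size_kb in (125, 400, 1200):
    print(size_kb, "kb ->", highest_magnification_for(size_kb))
```

Under this assumption a 125 kb BAC fits in a ×100 field, whereas the 400 kb and 1200 kb fragments of the YAC example would call for the ×40 and ×20 lenses respectively.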

The advantage of using such large probes is obviously to reduce the number of hybridizations necessary in order to find a clone of interest.

Example 2 Assay of the λ phage DNA in the Genome of E. coli

Genomic DNA of E. coli (lc=4.7 Mb) containing one copy of the λ phage genome (line 5243, lt=49 kb) was combed on silanized surfaces after having been counterstained with a fluorescent molecule (YOYO-1).

The total length of combed DNA per field of view was estimated from 30 to 50 fields of view uniformly distributed over the whole of each surface.

Biotin-dUTP labeled λ phage DNA was then hybridized and revealed with the aid of a system of antibodies coupled to FITC (green). The total length of hybridized DNA per field of view was estimated from 100 fields of view uniformly distributed over the whole of each surface.

The results of 4 experiments are represented in the table below, in which Nb E. coli = Lc/lc and Nb λ = Lt/lt.

Slide    Nb E. coli    Nb λ    R = λ/E. coli    ΔR
43_4     0.19          0.19    1.0              0.6
43_6     0.25          0.22    0.9              0.5
43_7     0.31          0.26    0.8              0.4
43_8     0.20          0.24    1.2              0.7
43_9     0.26          0.39    1.5              0.8
43_13    0.57          0.74    1.3              0.4
43_14    0.61          0.56    0.9              0.3

These results showed, except for slide 43_9, a ratio which is reasonably close to the expected value of 1. However, the principle used, consisting in measuring all the signals before hybridization, is not very practicable for larger genomes (the error ΔR is mainly due here to the small number of E. coli genomes measured, of the order of 6 to 24).
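The ratio column of the table can be recomputed from the two count columns. The sketch below only reproduces R = Nb λ / Nb E. coli, rounded to one decimal; the way ΔR was derived is not stated in the text and is not reconstructed here. The variable and function names are illustrative.

```python
# (Nb E. coli, Nb lambda, tabulated R) per slide, copied from the table.
SLIDES = {
    "43_4": (0.19, 0.19, 1.0),
    "43_6": (0.25, 0.22, 0.9),
    "43_7": (0.31, 0.26, 0.8),
    "43_8": (0.20, 0.24, 1.2),
    "43_9": (0.26, 0.39, 1.5),
    "43_13": (0.57, 0.74, 1.3),
    "43_14": (0.61, 0.56, 0.9),
}


def copy_ratio(nb_ecoli, nb_lambda):
    """Ratio of lambda phage copies to E. coli genomes per field of view."""
    return nb_lambda / nb_ecoli


for slide, (nb_ecoli, nb_lambda, tabulated) in SLIDES.items():
    r = copy_ratio(nb_ecoli, nb_lambda)
    print(f"{slide}: R = {r:.2f} (tabulated {tabulated})")
```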

Example 3 Assay of the Number of Amplicons in the Genome of Hamster Cells

The feasibility study was extended to mammalian genomic DNA. The system chosen is the DNA of 2 hamster lung fibroblast lines (lines 618 and GMA32) containing respectively 1 and 2 copies of the target gene AMP1 per haploid genome.

Genomic DNA of the cells, provided in solution, was combed on silanized surfaces, with an estimated density of 100 diploid genomes per 22×22 mm surface.

A cosmid probe (D3S1, ˜40 kb) specific for the target gene region was labeled with dig-dUTP. Another cosmid probe (565.5A1, ˜40 kb), specific for a region situated at about 1 Mb from the target gene, was used as control and labeled with biotin-dUTP.

The probes were hybridized to previously denatured combed genomic DNA, and were revealed in red (biotinylated probes) and green (digoxigenin-labeled probes) in one of the cases, and with reversed colors in the other. The hybridization signals for each color were measured on a number of fields of view which is representative of the surface.

In the case of line A32, the total size obtained for probe D3S1 (1400 μm approximately) represents 70 copies of the gene. The size obtained on the same fields of view for the control probe (1180 μm approximately) represents, for its part, 60 haploid genomes. This measurement therefore gives a target/control ratio of 1.2±0.3 which is compatible with the presence of one copy of the gene per haploid genome.
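The arithmetic behind these figures can be sketched as follows. The stretching factor of 2 kb per μm is an assumption inferred from the numbers quoted above (about 1400 μm for 70 copies of a probe of about 40 kb); the function name is hypothetical, and the ±0.3 uncertainty is not reconstructed.

```python
# Assumption: stretching factor of combed DNA, in kb per micrometre.
KB_PER_UM = 2.0


def copy_count(total_signal_um, probe_length_kb, kb_per_um=KB_PER_UM):
    """Number of probe copies represented by a total hybridized length.

    Implements N = L / l, with L the summed signal length converted to kb
    and l the probe length in kb.
    """
    return total_signal_um * kb_per_um / probe_length_kb


n_target = copy_count(1400.0, 40.0)   # probe D3S1
n_control = copy_count(1180.0, 40.0)  # control probe 565.5A1
ratio = n_target / n_control

print(f"target copies:  {n_target:.0f}")
print(f"control copies: {n_control:.0f}")
print(f"target/control: {ratio:.1f}")
```

Because both probes have about the same length, the ratio depends only on the two summed signal lengths and not on the assumed stretching factor.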

These experiments therefore show the feasibility of the gene assay using solely the hybridization signals of target and control probes, as long as a sufficient number of signals can be observed.

Example 4 Mapping in Pairs of 6 Cosmids on a YAC of 1600 kb Contained in Yeast Genomic DNA

YAC 774G4 contains a human genomic DNA clone of chromosome 15 containing the calpain gene (CANP3), whose mutation is responsible for a limb-girdle muscular dystrophy (LGMD 2A). In collaboration with J. Beckman's group at Généthon (Evry), we have combed yeast genomic DNA contained in blocks of Low Melting agarose (1 block, 1 μg of DNA per block) on silanized surfaces.

The six cosmids were hybridized in pairs and their respective sizes and distances were measured by producing histograms of the sizes and distances. The mean and standard deviation for the main peak of each histogram were extracted therefrom with the aid of specific software. The standard deviation of the measurements is found to be of the order of 2 to 4 kb (FIG. 11).
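The software used for this extraction is not described. One simple way of obtaining the mean and standard deviation of the main peak is sketched below, assuming that the main peak is the most populated histogram bin together with its immediate neighbours. The bin width, the window and the sample values are hypothetical and chosen for illustration only.

```python
import statistics
from collections import Counter


def main_peak_stats(lengths_kb, bin_width=5.0, window_bins=1):
    """Mean and standard deviation of the measurements in the main peak.

    The main peak is taken as the most populated bin plus `window_bins`
    neighbouring bins on each side (an assumption of this sketch).
    """
    bins = Counter(int(x // bin_width) for x in lengths_kb)
    mode_bin = max(bins, key=lambda b: (bins[b], b))
    low = (mode_bin - window_bins) * bin_width
    high = (mode_bin + window_bins + 1) * bin_width
    peak = [x for x in lengths_kb if low <= x < high]
    spread = statistics.stdev(peak) if len(peak) > 1 else 0.0
    return statistics.mean(peak), spread


# Hypothetical measurements (kb): a peak near 40 kb plus broken fragments.
sample = [38, 40, 41, 42, 39, 43, 12, 15, 20, 40]
mean_kb, sd_kb = main_peak_stats(sample)
print(f"main peak: {mean_kb:.1f} kb, standard deviation {sd_kb:.1f} kb")
```

Restricting the statistics to the main peak keeps the shorter, broken fragments from biasing the estimated size or distance.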

Example 5 Mapping of Pairs of Cosmids on Human Genomic DNA

4 cosmids belonging to 3 adjacent contigs separated by gaps were used in 2 series of hybridizations, carried out in collaboration with S. Povey's group at MRC (London). The contigs cover the region of the TSC1 gene involved in one of the forms of tuberous sclerosis (chromosome 9). The cosmid probes were prepared according to the protocol above.

The genomic DNA used was extracted from cell cultures and placed in blocks of Low Melting agarose at the rate of 10⁶ cells per block. 3 blocks treated according to the preceding protocol (reservoir of 4 ml, use of a final molarity of MES pH 5.5 of 150 mM) were used for the combing of the genomic DNA.

The cosmids were hybridized in pairs on similar slides, a total of several tens of double signals (red/green aligned) per slide being observed on average. The same measurement protocol as above was followed, giving rise to final values having the same accuracy as in the preceding experiment.

FIG. 12 presents a few typical images which allowed the measurement of the sizes of the 2 gaps studied.

Example 6 Mapping of Restriction Segments

This mapping technique is naturally applicable to any other type of combed DNA or subclone. A possible extension of the technique would consist, for example, in no longer mapping subclones of a clone, but directly mapping restriction fragments of the combed DNA (for example a clone of the BAC type).

This would avoid the production of intermediate subclones before sequencing, the physical map of the restriction fragments being obtained with sufficient accuracy to allow a reconstitution of the final sequence. The applicability of this technique rests on a good separation of the restriction bands, and adequate size (>10 kb) of the main fragments, as well as on the subsequent subcloning of the DNA of these bands (subcloning into small-sized vectors, after additional enzymatic restriction, for sequencing).

What is claimed is:
 1. A method of determining a copy number of a nucleotide sequence of interest in genomic DNA, comprising: (a) providing a surface on which genomic DNA has been aligned using a molecular combing technique; (b) contacting the aligned genomic DNA with a first labeled probe of length (lt), wherein the first labeled probe is specific for a control genomic sequence whose copy number is known, and with a second labeled probe of length (lc), wherein the second labeled probe is specific for the nucleotide sequence of interest; (c) detecting hybridization signals between the first labeled probe and the control genomic sequence and between the second labeled probe and the nucleotide sequence of interest; (d) measuring the total length of the hybridization signals that correspond to the first labeled probe (Lt) and the second labeled probe (Lc); (e) calculating the copy number of the nucleotide sequence of interest using the ratios: Nt=Lt/lt and Nc=Lc/lc, wherein Nt corresponds to the copy number of the control genomic sequence and Nc corresponds to the copy number of the sequence of interest.
 2. The method of claim 1, wherein the copy number of the control genomic sequence is a whole number and a significant difference between Nc and Nt indicates a genetic abnormality in the genome.
 3. The method of claim 1, wherein the control genomic sequence comprises separate portions whose total length (lt) per genome is known, wherein the genomic nucleotide sequence of interest comprises separate portions whose length (lc) per normal gene is known, and wherein a significant difference between Nc and Nt indicates a genetic abnormality in the genome.
 4. The method of claim 1, wherein the copy number of the control genomic sequence is two and a significant difference between Nc and Nt indicates a genetic abnormality in the genome.
 5. The method of claim 1, wherein the nucleotide sequence of interest is from a trisomy-linked chromosome, the control genomic sequence is from a chromosome other than the trisomy-linked chromosome, and a Nc/Nt ratio of approximately 1.5 indicates a trisomic genotype.
 6. The method of claim 1, wherein the nucleotide sequence of interest comprises a deletion of a portion of the genome.
 7. The method of claim 1, wherein the nucleotide sequence of interest comprises a repeating sequence.
 8. The method of claim 1, wherein the control genomic sequence and the nucleotide sequence of interest are identical for a given genome, and wherein several different genomes are combed on the surface and the respective quantities of each genome are determined.
 9. The method of claim 1, wherein the Nc/Nt ratio has a statistical error of less than 20%.
 10. The method of claim 6, wherein the control genomic sequence is from the same chromosome as the nucleotide sequence of interest.
 11. The method of claim 7, wherein the control genomic sequence is from the same chromosome as the nucleotide sequence of interest.